I am the Associate Director for the Master of Science in Data Science for Public Policy program (DSPP) and an assistant teaching professor in the McCourt School of Public Policy at Georgetown University. I earned my Ph.D. from the University of Maryland, College Park, where I worked under the direction of Johanna Birnir, David Cunningham, Kathleen Cunningham, and Ernesto Calvo.
My research examines the distribution and impact of political violence perpetrated by non-state organizations to isolate plausible policy interventions targeted at reducing the occurrence and spread of conflict. Ongoing projects explore the effect of membership heterogeneity on the strategic use of violent tactics by armed actors; integration of multiple conflict event datasets to improve measurement of violent activity; and the use of live simulated environments to analyze normative behavior.
As a computational social scientist, I develop, use, and teach computational tools that (a) apply machine learning and computational methods to draw descriptive inferences from data and (b) leverage non-traditional data assets to better understand social processes.
Disaggregated studies of conflict typically rely on a single dataset to make inferences. In this project, we advocate integrating multiple datasets to improve measurement and analysis.
This project examines why some violent non-state actors experiment with and develop a broad repertoire of tactics and targets to achieve their political aims while other groups consistently utilize the same methods across their lifespan.
We leverage live simulated environments to examine individual and group-level normative and strategic behavior.
We explore structural breaks in diplomatic meeting networks as a predictor for shifts in foreign policy.
The growing multitude of sophisticated event-level data collection enables novel analyses of conflict. Even when multiple event data sets are available, researchers tend to rely on only one. We instead advocate integrating information from multiple event data sets. The advantages include facilitating analysis of relationships between different types of conflict, providing more comprehensive empirical measurement, and evaluating the relative coverage and quality of data sets. Existing integration efforts have been performed manually, with significant limitations. Therefore, we introduce Matching Event Data by Location, Time and Type (MELTT), an automated, transparent, reproducible methodology for integrating event data sets. For the cases of Nigeria 2011, South Sudan 2015, and Libya 2014, we show that using MELTT to integrate data from four leading conflict event data sets (Uppsala Conflict Data Program–Georeferenced Event Data, Armed Conflict Location and Event Data, Social Conflict Analysis Database, and Global Terrorism Database) provides a more complete picture of conflict. We also apply multiple systems estimation to show that each of these data sets has substantial missingness in coverage.
Do firm founders from nations with more predictable and transparent institutions allocate more autonomy to their employees? A cultural imprinting view suggests that institutions inculcate beliefs that operate beyond the environment in which those beliefs originate. We leverage data from a multiplayer online role-playing game, EVE Online, a setting where individuals can establish and run their own corporations. EVE players come from around the world, but all face the same institutional environment within the game. This setting allows us to disentangle, for the first time, cultural norms from the myriad other local factors that will influence organizational design choices across nations. Our main finding is that founders residing in nations with more predictable and transparent real world institutions delegate more authority within the virtual firms they create.
Does greater ethnic inclusion into the executive have a positive effect on a country’s economic development? We posit that by allowing for greater diversity in a state’s decision-making process, ethnic populations find their preferences represented and thus are more likely to support enacted policies; at the same time the quality of the policy increases as a greater variety of perspectives are introduced. Utilizing the new AMAR (All-Minorities at Risk) data to capture ethnic diversity, this article offers a preliminary description, suggesting that higher levels of inclusion positively correlate with indicators of economic growth.
This chapter offers insight into the utility of the latest release of Uppsala Conflict Data Program’s Georeferenced Event Dataset (UCDP-GED). The UCDP has an established record of compiling and disseminating an array of widely used data resources. The field of conflict studies, and the data that contributing scholars collect, have progressively moved toward greater specificity along several dimensions. UCDP-GED records the category of violence, the actors involved, the location and associated coordinates, and the timing of each event, as well as other characteristics. UCDP has been the source of the most widely used data in academic research on violence committed by organized armed actors. In particular, UCDP-GED provides a means for analyses to test micro-level theories. UCDP-GED has paved the way for methodological advances with a major bearing on substantive contributions to the literature.
2019 “Where a Founder Is from Affects How They Structure Their Company” (with David Waguespack and Johanna K. Birnir). Harvard Business Review.
MELTT (Matching Event Data by Location, Time, and Type): an R package that offers a methodology for systematically integrating disparate geospatial event data by leveraging information on spatio-temporal co-occurrence and event-specific metadata.
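To illustrate what matching on spatio-temporal co-occurrence can look like, here is a minimal sketch in Python. It is not the MELTT package's actual algorithm or API (the package itself is written in R). The `Event` class, the function names, the distance and time thresholds, and the toy events are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    """One event record (hypothetical schema, not MELTT's)."""
    source: str   # label of the dataset the record came from
    day: date     # date of the event
    lat: float    # latitude in decimal degrees
    lon: float    # longitude in decimal degrees
    kind: str     # event type, already mapped to a shared taxonomy


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def is_candidate_match(a, b, max_km=5.0, max_days=1):
    """Flag two records from different sources as a possible duplicate.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if a.source == b.source:
        return False
    close_in_time = abs((a.day - b.day).days) <= max_days
    close_in_space = haversine_km(a.lat, a.lon, b.lat, b.lon) <= max_km
    same_type = a.kind == b.kind
    return close_in_time and close_in_space and same_type


def find_matches(left, right, max_km=5.0, max_days=1):
    """Compare every pair across two event lists (quadratic; fine for a sketch)."""
    return [(a, b) for a in left for b in right
            if is_candidate_match(a, b, max_km, max_days)]


# Toy records with made-up coordinates and dates, for illustration only.
dataset_a = [Event("dataset_a", date(2011, 1, 1), 9.00, 7.00, "attack")]
dataset_b = [
    Event("dataset_b", date(2011, 1, 2), 9.01, 7.01, "attack"),  # near in space and time
    Event("dataset_b", date(2011, 1, 1), 10.00, 7.00, "attack"),  # roughly 111 km away
]
matches = find_matches(dataset_a, dataset_b)
```

With these toy inputs, only the first `dataset_b` record pairs with the `dataset_a` record. A real integration effort would also need to reconcile differing event taxonomies and resolve cases where one record has several candidate matches, which this sketch does not attempt.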
I primarily teach graduate-level computational social science courses at Georgetown University. As an instructor, I try to balance substance with methodological rigor by training students to employ computational methods effectively to investigate, analyze, and learn from data, and to formulate and test theoretically relevant hypotheses. In my instruction, I match formal computational training with hands-on empirical examples so that quantitative methods are taught in the context where they are applied.
I aim to train students to: (i) use machine learning methods to explore data and generate hypotheses; (ii) design and implement statistical analyses that support causal inference from observational and experimental data; (iii) synthesize disparate and unstructured data to draw meaningful insights for public policy and political science inquiries; and (iv) visualize data to effectively communicate empirical findings. My goal is to train students to be effective consumers, critics, and producers of computational social science.
Course taught: Spring 2019, Spring 2020
This is the second course in the two-course sequence on quantitative methods for social science for the Master of Science in Data Science for Public Policy (DSPP). The course builds on students’ understanding of multivariate regression and introduces advanced, but commonly used, methods of statistical analysis. The course is broadly divided into two parts: advanced modeling and causal inference. Instruction will concentrate on how to determine the appropriate econometric approach for various types of policy questions, while highlighting the challenges in isolating causal effects. The emphasis is on applied learning; formal proofs and mathematical rigor are presented but are not the principal focus of the course. As part of our effort to teach effective communication skills, students will make presentations about applications of the techniques being studied in class.
Course taught: Fall 2018, Fall 2019
This first course in the core data science sequence for the Master of Science in Data Science for Public Policy (DSPP) introduces students to the programming and mathematical concepts that underpin statistical learning. The aim of the course is to provide DSPP students with the foundations necessary to grasp the concepts and algorithms encountered in Data Science II and III. Students will cover topics related to linear algebra (with a focus on linear regression and dimension reduction); multivariate calculus (with an emphasis on optimization algorithms, specifically gradient descent); and probability theory (with an emphasis on simulation and sampling). Throughout the course, students will be introduced to the fundamentals of programming and manipulating data in Python. Students will work in Jupyter notebooks and use Git/GitHub to submit coding assignments, developing literate programming and reproducible research skills they will use throughout the program.
Course taught: Spring/Fall 2019, Spring 2020
This course teaches Master of Public Policy (MPP) students how to synthesize disparate, possibly unstructured data in order to draw meaningful insights. Topics covered include fundamentals of functional programming in R, literate programming, data wrangling, data visualization, data extraction (via web scraping and APIs), text analysis, and machine learning methods. In addition, students will be exposed to Git and GitHub for reproducible research. The course aims to offer students a practical toolkit for data exploration and to equip them with the skills to incorporate data into their decision-making and analysis.
I advise thesis projects for students in the Masters of Conflict Resolution program at Georgetown University.
R”. (Data Science in Action Seminar) McCourt School of Public Policy, Georgetown University
R” (Short Course) Smith School of Business, University of Maryland, College Park
R: A short course on processing, analyzing, and visualizing data in R” (Short Course) Creative Associates International, Washington DC
R” (Talk) University of Maryland, College Park
R” (Workshop) University of Iceland, Reykjavik
R” (Workshop) University of Maryland, College Park
R” (Workshop) University of Maryland, College Park